SLIQ: A Fast Scalable Classifier for Data Mining

Authors

  • Manish Mehta
  • Rakesh Agrawal
  • Jorma Rissanen

Abstract

Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most classification algorithms are designed only for memory-resident data, which limits their suitability for mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree-growing strategy to enable classification of disk-resident data sets. SLIQ also uses a new tree-pruning algorithm that is inexpensive and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale to large data sets and to classify data sets irrespective of the number of classes, attributes, and examples (records), making it an attractive tool for data mining.
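
The abstract names two ideas that work together: sorting each numeric attribute once, and growing the tree breadth-first so that a single pass over a sorted list can score candidate splits for every leaf of the current level. The Python below is a minimal in-memory sketch of that idea under stated assumptions. The function names, the string leaf ids ("" for the root, with "L"/"R" appended for children), and the use of the gini index as the split score (the criterion usually associated with SLIQ, though the abstract does not name it) are illustrative choices, not the paper's code. It handles only numeric attributes and binary `value <= v` tests, keeps every list in memory rather than on disk, and omits categorical splits, stopping rules, and pruning.

```python
from collections import Counter, defaultdict


def gini(hist):
    """Gini index 1 - sum_j p_j^2 of a class histogram."""
    n = sum(hist.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in hist.values())


def split_gini(below, above):
    """Size-weighted gini index of a binary split."""
    n1, n2 = sum(below.values()), sum(above.values())
    n = n1 + n2
    return (n1 / n) * gini(below) + (n2 / n) * gini(above)


def presort(rows, attr):
    """Hypothetical helper: attribute list of (value, record id) pairs,
    sorted once up front and reused at every tree level."""
    return sorted((row[attr], rid) for rid, row in enumerate(rows))


def best_numeric_splits(attr_list, labels, leaf_of):
    """One scan of a presorted attribute list finds, for every current leaf,
    the best test `value <= v` as a (gini, v) pair.

    labels[rid] and leaf_of[rid] together stand in for a class list
    (class label plus the leaf that currently holds the record)."""
    total = defaultdict(Counter)  # full class histogram of each leaf
    for rid, leaf in enumerate(leaf_of):
        total[leaf][labels[rid]] += 1

    below = defaultdict(Counter)  # per-leaf histogram of records scanned so far
    pending = set()               # leaves touched since the last distinct value
    best = {}
    for i, (value, rid) in enumerate(attr_list):
        leaf = leaf_of[rid]
        below[leaf][labels[rid]] += 1
        pending.add(leaf)
        at_boundary = i + 1 == len(attr_list) or attr_list[i + 1][0] != value
        if not at_boundary:
            continue
        # Every record with this value is now counted, so `<= value` is a valid test.
        for lf in pending:
            above = total[lf] - below[lf]  # Counter subtraction drops zero counts
            if not above:                  # nothing would go right: not a real split
                continue
            score = split_gini(below[lf], above)
            if lf not in best or score < best[lf][0]:
                best[lf] = (score, value)
        pending.clear()
    return best


def update_class_list(attr_list, leaf_of, splits):
    """Second scan: point each record in a split leaf at its new child leaf."""
    new_leaf_of = list(leaf_of)
    for value, rid in attr_list:
        leaf = leaf_of[rid]
        if leaf in splits:
            threshold = splits[leaf][1]
            new_leaf_of[rid] = leaf + ("L" if value <= threshold else "R")
    return new_leaf_of


# Hypothetical toy data: one numeric attribute and two classes.
rows = [{"age": 23}, {"age": 17}, {"age": 43}, {"age": 68}, {"age": 32}, {"age": 20}]
labels = ["High", "High", "Low", "Low", "Low", "High"]
leaf_of = [""] * len(rows)  # every record starts in the root leaf ""

age_list = presort(rows, "age")
splits = best_numeric_splits(age_list, labels, leaf_of)
leaf_of = update_class_list(age_list, leaf_of, splits)
print(splits)   # {'': (0.0, 23)}
print(leaf_of)  # ['L', 'L', 'R', 'R', 'R', 'L']
```

Because `age_list` is sorted only once and only the leaf pointers change, the same sorted list can be rescanned at the next level, where one pass would score candidate splits for both "L" and "R" together.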

Similar Resources

CC-SLIQ: Performance Enhancement with 2k Split Points in SLIQ Decision Tree Algorithm

Decision trees have been found to be very effective for classification in the emerging field of data mining. This paper proposes a new method: CC-SLIQ (Cascading Clustering and Supervised Learning In Quest) to improve the performance of the SLIQ decision tree algorithm. The drawback of the SLIQ algorithm is that in order to decide which attribute is to be split at each node, a large number of G...

SLEAS: Supervised Learning using Entropy as Attribute Selection Measure

There is embryonic importance in scaling up the broadly used decision tree learning algorithms to huge datasets. Even though abundant diverse methodologies have been proposed, a fast tree growing algorithm without substantial decrease in accuracy and substantial increase in space complexity is essential to a greater extent. This paper aims at improving the performance of the SLIQ (Supervised Le...

An Approach to Automation Selection of Decision Tree based on Training Data Set

In data mining applications, very large training data sets with several million records are common. Decision trees are a very powerful technique for both classification and prediction problems. Many decision tree construction algorithms have been proposed to develop and handle large or small training data. Some related algorithms are best for large data sets and some for small ...

Publication date: 1996